Data-driven multiscale dynamical framework to control a pandemic evolution with non-pharmaceutical interventions

Before the availability of vaccines, many countries have resorted multiple times to drastic social restrictions to prevent saturation of their health care system, and to regain control over an otherwise exponentially increasing COVID-19 pandemic. With the advent of data-sharing, computational approaches are key to efficiently control a pandemic with non-pharmaceutical interventions (NPIs). Here we develop a data-driven computational framework based on a time discrete and age-stratified compartmental model to control a pandemic evolution inside and outside hospitals in a constantly changing environment with NPIs. Besides the calendrical time, we introduce a second time-scale for the infection history, which allows for non-exponential transition probabilities. We develop inference methods and feedback procedures to successively recalibrate model parameters as new data becomes available. As a showcase, we calibrate the framework to study the pandemic evolution inside and outside hospitals in France until February 2021. We combine national hospitalization statistics from governmental websites with clinical data from a single hospital to calibrate hospitalization parameters. We infer changes in social contact matrices as a function of NPIs from positive testing and new hospitalization data. We use simulations to infer hidden pandemic properties such as the fraction of infected population, the hospitalisation probability, or the infection fatality ratio. We show how reproduction numbers and herd immunity levels depend on the underlying social dynamics.


Introduction
The fast spreading COVID-19 pandemic has destabilized the world during the past two years, forcing most countries into alternating periods of confinement and deconfinement. These sequential measures have attenuated the disease progression that otherwise would have increased exponentially, and thus prevented health care systems from getting destabilized [1][2][3][4]. For  how the national lockdowns at March 18, 2020 and October 30, 2020 have stopped the exponential pandemic growth (Fig 1). With the availability of vaccines the pandemic seems under control if the population is progressively vaccinated. However, it remains the serious threat that current vaccines are not efficient to handle the constant emergence of new mutations, and the pandemic might get out of control again [5]. If no vaccines are available, for example at the beginning of a pandemic, monitoring and modeling tools are essential to predict and curtail the pandemic growth together with its consequences on the health care system [6]. During the early phase of the COVID-19 pandemic when vaccines were not available, the past experience has shown that surveillance, testing and severe social restrictions (non-pharmaceutical interventions, NPIs) attenuated the viral spread, which prevented hospitals from becoming overloaded. However, these restrictions not only came at a heavy economical cost [7], they also heavily disturbed social, school and academic life. Designing efficient social measures that not only curb the pandemic and exonerate the health care system, but also minimize social and economic impacts remains challenging [8]. In presence of a fast exponential pandemic growth characterized by a large reproduction number R 0 � 3, NPIs have to be taken as soon as possible because a small delay of only a few days can already significantly affect the hospital load [9].
Despite of these modelling efforts, it remains challenging to make reliable predictions during the early phase of a new pandemic due to incomplete knowledge, constant changes in social behaviour, or unexpected events. To cope with these challenges, we propose a datadriven dynamical framework that constantly adapts to new data in order to provide reliable near future predictions that can be used to control the pandemic evolution (Fig 2). We calibrate the framework with a large variety of data (Fig 2A): web accessible hospitalisation statistics (Fig 1), clinical data from the Bichat hospital in Paris (Fig 6), serology and positive testings ( Fig. S2A in the S1 File), and surveys about social interaction matrices (Fig 3D and Eq. 6 in the S1 File). We implemented an automated feedback and inference methods to constantly update model parameters to new data (Fig 2A and 2B), and to subsequently readjust near future predictions of the pandemic and hospitalisation evolution (Fig 2C and 2D). To perform dynamical simulations of the pandemic and hospital evolution, we devise a spatially coarse grained and time discrete compartmental model (Fig 3). Besides the generic susceptible compartment, we have compartments that characterize the infection status of an infected person, for example asymptomatic, symptomatic, hospitalised or recovered (Fig 3C). We stratify the population according to age (Fig 3B), and we omitted persons older than 80 because many of them live in retirement homes where social interactions are different from the contact matrices used here. Beside the calendrical time, we introduce the compartmental time that counts the time since an infected person is in its current status, for example the time since an asymptomatic person became infected, or the time since a symptomatic person developed symptoms (Fig 3A). We use transition probabilities that depend on the compartmental time to model the infection dynamics characterized by switchings between compartments. This is different from SIR type of models, where the dynamics is approximated with exponential transition rates. We use is used to predict the pandemic and hospitalisation situation P(t|t 0 ) at future times t > t 0 . (D) As time progresses, a checkpoint procedure constantly confronts the predictions P(t|t 0 ) to the data D(t). If at some time t the discrepancy between model and data becomes larger than a threshold T, a recalibrated model M(t) is derived using the data D(t). The model M(t) is then used to generate new predictions P(t 0 |t) for times t 0 > t. Based on these predictions, social measures and strategies are adjusted. https://doi.org/10.1371/journal.pone.0278882.g002

PLOS ONE
clinical data from a large Parisian hospital to compute hospital transition probabilities (Fig 6). We implement fitting procedures to estimate changes in contact matrices due to NPIs from the data for the number of new hospitalisations (Fig 4). We calibrate the model at national level in France until February 2021, since in this work we do not consider the impact of vaccinations and new viral strains. We use the causal dynamical simulations to study pandemic properties inside and outside hospitals.

Web-available epidemiological and testing data, and hospital specific clinical data
To calibrate model parameters we use data from several sources, as described below.
Age-stratified hospitalisation data from the French governmental website. Age-stratified hospitalisation data is provided by the French Government (www.data.gouv.fr) since March 18, 2020. We used the databases donnees-hospitalieres-covid19 and donnees-hospitalieresclasse-age-covid19 to obtain daily number of patients hospitalised H(n), in intensive care (ICU),  Table 1. The new hospitalisation data (diamonds) in (B-D) is from Fig 1. (A) Schematic of the sequential calibration procedure. The data for period k+1 ([t k , t k+1 ]) is used to fit the new contact matrix c a;a 0 (t k ) at time t k that accounts for new social measures that are put into effect at t k . The model prediction P(t|t k ) (red curve) computed with the unchanged contact matrix before deviates from the data for t > t k . The prediction P(t|t k+1 ) (green curve) is computed with the new contact matrix.  in hospitals D(n), and recovered from hospitalisations R(n) (Fig 1). To reduce fluctuations we smoothed the data with a Gaussian filter. We computed the daily number of new hospitalisations as newH(n + 1) = H(n + 1) − (H(n) − R(n) − D(n)).
Age-stratified testing data from the French governmental website. Age-stratified results from COVID-19 testings in laboratories and hospitals is provided by the French Government (www.data.gouv.fr) via the database donnees-relatives-aux-resultats-des-tests-virologiques-covid-19. The age stratified number of positive tested persons is shown in Fig. S2 in the S1 File. We also used published data about the prevalence of viral mutations (donnees-de-laboratoires-pour-le-depistage-indicateurs-sur-les-mutations).
Age-stratified clinical data from the Bichat hospital. The Bichat hospital in Paris provided anonymized and age-stratified data collected between March 2020 and January 2021 that displays the hospitalisation history of 586 patients who all received ICU treatment. The data does not contain patients in the age group 0-19 years old. Statistics extracted from this data are shown in Fig 6. The clinical data comes from the OutcomeRea database that was declared to the Commission Nationale de l'Informatique et des Libertés (#999,262), in accordance with French law, and this study was approved by the institutional review board of Clermont Ferrand. Informed consent was not required because the study did not modify patients management and the data were anonymously collected.

Data based feedback algorithm to control the pandemic and hospitalisation evolution
We use a variety of data to update and calibrate model parameters (Fig 2A): web accessible hospitalisation statistics (Fig 1), clinical data revealing hospital procedures (Fig 6), testing results ( Fig. S2 in the S1 File, and social contact matrices (Eq. 6 in the S1 File). We implement an automated feedback algorithm where predictions are compared to new data, new data is used to recalibrate parameters, and recalibrated parameters are used to generate improved predictions. The feedback procedure consists of three steps (Fig 2B-2D): 1. In the first step, the model M(t 0 ) is calibrated with the data D(t 0 ) up to time t 0 ( Fig 2B). To adjust parameters we implemented fitting procedures. If new social measures are put into effect at time t 0 , the contact matrix is adjusted such that P(t|t 0 ) takes into account the expected impact of these measures.
2. In the second step, the model M(t 0 ) is used to forecast the pandemic and hospitalisation evolution P(t|t 0 ) for times t > t 0 (Fig 2C).
3. As time progresses, a checkpoint procedure constantly confronts the prediction P(t|t 0 ) to new data D(t) to verify whether the prediction is still conform with reality ( Fig 2D). If at time t the discrepancy between prediction and data becomes larger than a threshold T, With the model M(t) new predictions P(t 0 |t) are computed for times t 0 > t. If the new predictions are no longer conform with the expectations, this might be an early indication that social measures have to be readjusted.

Multiscale modeling framework
The modeling framework that we developed implements the feedback algorithm specified above. The model is multiscale because it depends on calendrical and compartmental time, and on data acquired at national and single hospital level. Mathematical details are given in the S1 File. We developed a spatially homogeneous compartmental model with a discrete time-resolution of one day ( Fig 3A). We distinguish between 5 age groups ( Fig 3B) and 5 infection compartments ( Fig 3C). We use transition probabilities between these compartments to compute the disease progression ( Fig 3C). Social interactions are modelled with time-dependent contact matrices that account for national political interventions ( Fig 3D).
Multiple time scales to compute the disease progression. We distinguish between the calendrical time n that measures the number of days since January 1, 2020, and the compartmental time l that measures the number of days since an infected person joined his current infection compartment (Fig 3A). The total time m since infection can be computed by adding the times spent in each compartment. For asymptomatic persons in compartment j = 1, the times m and l are identical. Transition probabilities might not only depend on the times n and l, but also on the time m since infection. However, since our hospital data only species the times l after hospitalisation (see Fig 6), we use here only the times n and l for the modelling.
Population divided into five age-stratified groups. We classify the population into 5 age groups labelled by k = 1, . . ., 5 ( Fig 3B): 0-19, 20-49, 50-59, 60-69 and 70-79 years old. The population in age group k is N k = N tot p k , where N tot = 67 millions and p k = (24%, 36.3%, 13.1%, 11.9%, 10.2%) is from https://www.statista.com/statistics/464032/distribution-population-agegroup-france. We do not consider persons older than 80 because many of them live in retirement homes where the interaction dynamics and contact matrix is very different from the rest of the population. For this fragile age group we suggest implementing a separate model.

Disease progression characterized by five infection compartments.
Besides the susceptible population, to characterise the infected population we consider 5 infection compartments and transition probabilities that specify the disease progression ( Fig 3C). In the current implementation the transition probabilities depend only on the time l. A newly infected person starts in the asymptomatic compartment j = 1. From there it can either develop symptoms and become symptomatic (switching to j = 2), or remain asymptomatic and eventually recover. For a symptomatic person the disease either further deteriorates such that hospitalisation is needed (switching to j = 3), or it will recover without hospitalisation. A hospitalised patient in compartment j = 3 can be transferred to ICU (switching to j = 4), or it will eventually recover and be released from the hospital. In ICU, a patient can either die, or be transferred back to normal hospital (switching to j = 5). Finally, a patient in hospital compartment 5 will eventually be released and join the recovered population. In total, the model distinguishes between four types of recovered persons.
Infection dynamics leading to new hospitalisations with non-pharmaceutical interventions (NPIs). Since the number of new infections is hidden, and the number of positive testings was not reliable enough throughout the beginning of the pandemic [27], we decided to use the data for the daily number of new hospitalisations to calibrate the infection dynamics leading to new hospitalisations (Fig 1). We neglect the number of new infections that are possibly generated in hospitals. As a consequence, we can independently calibrate the model leading to new hospitalisations, and the model for the hospital dynamics. We use the model for new hospitalisations as input to calibrate the hospitalisation parameters of the model (see section 3.2).
In the following we discuss the procedures that we implemented to calibrate the model leading to new hospitalisations.
• As initial condition for the pandemic we fitted a seed of new infected persons in the agegroup 20-49, since person from this age group are likely to have introduce the virus in France due to travailing. We fixed January 1, 2020 as initial date for simulations. We verified that changing this date does not much affect simulation results at later times because it can be compensated by changing the value of the initial seed.
• Initially, a new infected person is in the asymptomatic compartment j = 1. When the infection manifests itself with symptoms, the asymptomatic person switches to the symptomatic compartment j = 2. We assume that the now an alerted person takes precautions to avoid infecting others. Thus, in our implementation only infected persons belonging to the asymptomatic compartment spread the disease. We therefore consider a non-zero infectiousness only for the asymptomatic compartment (see Eq. 4 in the S1 File). However, the model is general such that a non-zero infectiousness for any compartment can be implemented, for example, to include new infections generated by symptomatic persons. We use an incubation period of 5 days [39]. Since we do not have data for the evolution of the viral load after infection, we simplify and assume that a person is uniformly infectious within the time period l = 5 − 11 days after the infection (Eq. 4 in the S1 File). We checked that modifying the duration of the infectiousness period does not significantly alter results because there is compensation with the fitted value of the overall infectivity parameter β.
Because the relation between serial time, generation time and infectiousness is not obvious, especially with a large fraction of asymptomatic transmissions [40,41], we did not use the serial time distribution from [42] as a surrogate to estimate the infectiousness distribution (but see [33]). We simplified and assumed that an infected person can develop symptoms with uniform probability within the period l = 6 − 11 days after the infection [43]. This gives two days (l = 5, 6) where an asymptomatic person transmits the disease before symptoms can appear (often referred to a pre-symptomatic transmission). Note that SIR models have to introduce a pre-symptomatic compartment to model an infected person that is asymptomatic but infectious. In contrast, with our time scale l, we can incorporate a time dependent infectiousness for the asymptomatic compartment without having to introduce a pre-symptomatic compartment.
• We fitted the age dependent probabilities to remain asymptomatic. To reduce the number of fitting parameters, we consider that the age groups 2-4 have the same probability to remain asymptomatic. For age group 1, we constrained the fitting range to values between 30% and 80%, for age groups 2-4 to 30% and 50%, and for age group 5 to 20% and 40%.
• Based on findings that the viral load is independent of age [43][44][45], we assumed that the infectiousness is independent of age. However, since it is unclear whether children and adolescents have the same infection dynamics as adults [46], we allowed that the susceptibility of group 1 to become infected is different from the other groups. We fitted the susceptibility of group 1 with the constraint that it is reduced by maximally 50% compared to the other groups (see Eq. 7 in the S1 File).
• We fitted the age-stratified probabilities for a symptomatic person to become hospitalised ( Fig. S1 in the S1 File). To constrain the fitting, we first fitted the data for new hospitalisations from the number of positive testings (Fig. S2 in the S1 File) to extract age stratified probabilities that a positively tested person becomes hospitalised: we obtained 0.6%, 1.5%, 4.5%, 10.5%, and 22.8%. We used these values as lower limits for the probability that a symptomatic person becomes hospitalised, since there is also the possibility that persons have been tested positive multiple times, for example due to travelling, or that a person become hospitalised without being tested before. Due to reduced travelling and social interactions, we assumed that a person in the age group 70-79 is tested only when showing symptoms. We therefore reduced the fitting parameters by using the value of 22.8% for group 5 as a fixed anchor (Fig. S1A in the S1 File). We further reduced fitting parameters by assuming that the distribution, which specifies at which day l a persons becomes hospitalised after showing symptoms, is age independent (Fig. S1B in the S1 File).
• We modified contact matrices to account for multiple governmental interventions that affected social interactions. Table 1 specifies the dates when new social restrictions have been put into effect (entries marked with bold font), or restrictions have been relieved (entries marked with italic font). We added two additional dates beginning of August and October to account for the beginnings of summer vacation and new academic year. The dates in Table 1 correspond to checkpoint times t k where contact matrices were modified (all other model parameters were kept unchanged). We divided the past year into consecutive time periods that are delimited by these checkpoint times. For example, period 1 ([t 0 , t 1 ]) corresponds to the time from January 1, 2020 until March 18, 2020 when the first national lockdown was imposed, and period 2 ([t 1 , t 2 ]) is the lockdown period until May 11, 2020. At each time t k we fitted a new contact matrix c a;a 0 (t k ) (15 fitting parameters) using the new hospitalisation data for period k+ 1 ([t k , t k+ 1 ]) ( Fig 3A).
• We extracted the contact matrix during period 1 before the first lockdown in March 2020 from [38] (see Eq. 6 in the S1 File). We fitted the contact matrix for the lockdown period (period 2). To constrain the lockdown contact matrix parameters (15 parameters), we made the following assumptions about the impact of the lockdown measures (see also [3,47]): contacts between group 1 and groups 4 and 5 are reduced by at least 90%, since grandparents could no longer be visited; due to school closure, the intra-group contact of group 1 is reduced by at least 70%; all the remaining contact parameters are reduced by at least 40%.
• Since new hospitalisation data was not available for period 1, we simulated period 1 and 2, and we used the new hospitalisation data from period 2 for the fitting. We fitted the unknown infection parameters (initial seed, fraction asymptomatic, infectivity, susceptibility of age group 1, probabilities for symptomatic to become hospitalised; see also the paragraph further down where we summarize the fitted values for the infection parameters) and the lockdown contact matrix. Our inference methods are such that any model parameter can be fitted. To reduce the number of consecutive fitting parameters, we adopted a recursive fitting procedure: We first generated a lockdown contact matrix that we used as input, and we fitted the infection parameters by minimizing the mean squared error between simulation and new hospitalisation data for period 2. Then we used the fitted infection parameters as input, and in a second round we fitted the lockdown contact matrix with the new hospitalisation data for period 2. Then we used the fitted lockdown contact matrix as input and we refitted the infection parameters, etc. We recursively performed this procedure until the change in the mean squared error became lower than a chosen threshold value.
• For the time after the lockdown (after May 11, 2020), we fitted only contact matrices and we kept the infection parameters unchanged. Since masks were largely unavailable before May 11, the pandemic declined during the first lockdown as a consequence of a reduction in social contacts. In contrast, with the availability of masks, the infection dynamics further depends on a reduced infectivity of social interactions due to mask wearing [48]. However, because a reduction in the infectivity parameter β(n) due to masks has the same effect as a uniform reduction of the contacts matrix (Eq. 10 in the S1 File), we adopted the following fitting procedure after May 11: 1) we kept β(n) constant and fitted only contact matrix parameters; 2) if the social measures had a deconfining character, we used the preceding contact matrix as lower limit, otherwise as upper limit. Consequently, the fitted contact matrices after from May 11, have to be considered as effective matrices that also comprise the effect of mask wearing. With this method we sequentially calibrated the model to the new hospitalisation data until February 2021 (Fig 4D; fitting results are solid lines, diamonds are data; vertical red dashed lines indicate confining events, green dashed lines deconfining events). To visualise these changes, we defined the transmission index t k ¼ P a 0 �a c a;a 0 ðt k Þ P a 0 �a c a;a 0 ð1Þ , and we show how this value evolved due to mask wearing and social measures (Fig 4E). The index is normalized to one for the initial uncontrolled phase of the pandemic. The strong reduction of the transmission index by almost 80% during the first lockdown after March 18, 2020 is entirely due to contact reductions, whereas the values at later times comprise the combined effect of mask wearing and contact reduction. Without changing the contact matrix at March 18, 2020 the pandemic would have continued to grow exponentially (Fig 3B, solid curves). In contrast, with the lockdown contact matrix the model reproduces the pandemic decline during the lockdown (Fig 3B, dashed curves). As another example that shows the impact of contact matrix changing, we consider October 30 where a second national lockdown has been imposed that lasted officially until November 28. However, we could not satisfactorily reproduce the new hospitalisation data by assuming the same contact matrix between October 30 and November 28 (Fig 3C, solid curves). Instead, a much better fit was obtained by adapting the contact matrix already around 10 days before the official end of the lockdown (Fig 3C, dashed curves), suggesting that the social interaction dynamics started to change already before the official end of the lockdown.

Summary of fitted infection parameters.
The initial seed of infected in age group 2 at January 1, 2020 is 11. The fraction of asymptomatic is 80%, 46.15%, 46.15%, 46.15%, and 20%. The values 80% and 20% correspond to upper and lower limits. The susceptibility of age group 1 was reduced by 37% compared to the other groups. The value for the infectivity β(n) was 2.63 × 10 7 (Eq. 7 in the S1 File). This value determines the exponential growth during the initial phase, and depends on the normalisation of the contact matrix. The probability that a symptomatic person becomes hospitalised is given in Fig. S1 in the S1 File.

Results
Although there is regional heterogeneity of the infection spread throughout France [3,32,49], we adopted a coarse grained approach to model the pandemic evolution at national level. We split the model into two parts: In the first part we model the infection dynamics leading to new hospitalisations, and in the second part we use the new hospitalisations to predict the hospital workload.

Infection dynamics with multiple social interventions
Multiple governmental interventions during the past year affected social interactions, which we accounted for by adapting the contact matrix of the model (see Materials and methods). The latest contact matrix accounts for the social measures from January 16, 2021. We did not derive contact matrices for the time after January 20201 because the model does not include the effect of vaccinations and viral mutations. We simulated the pandemic evolution during 2021 to show the consequences of a scenario with persisting social restrictions from January 2021 ( Fig 5; the vertical dashed lines indicate that the model has been calibrated to data until February 15, 2021).
The simulated number of new infected per age group exhibit three prominent peaks in March 2020, October 2020 and January 2021 when the pandemic was not under control ( Fig  5A). Interestingly, the various age groups contribute very differently to new infections: the dominant contributions come from age groups 0-19 and 20-49 due to their large number of interactions (contact matrix, Eq. 6 in the S1 File), whereas infected persons in the age group 60-79 contribute only little to spread the disease (Fig 5A). The simulation reveals that with the social restrictions from January 2021 the pandemic would have declined during 2021 if no viral mutations had appeared.
From the number of new infections, we computed the age dependent fraction of infected population (Fig 5B). At May 11, 2020 around 5% of the population had been infected, which increased to around 13.6% until January 16, 2021 (Fig 5B, yellow curve). However, whereas by January 16, 2021 only around 5% of the population in the age group 70-79 has been infected ( Fig 5B, magenta), this value increases to around 18% for the age group 0-19 (black).
To characterize the pandemic growth, we computed age-stratified effective reproduction numbers R k (n) that represent the mean number of new infections that will be generated by a person that becomes newly infected at day n in age group k (Eq. 12 in the S1 File). By averaging over age groups we further defined the mean reproduction number R(n) (Eq. 13 in the S1 File). We found that the reproduction numbers R k (n) are strongly age and time dependent (Fig 5C). This reflects the impact of multiple social measures and the age stratification of interaction frequencies. For example, the reproduction number for age group 70-79 is always smaller than one due to the low interaction frequency, whereas it is almost always above one for age group 0-19 (Fig 4C, magenta vs black curve).
To characterise how new infections are distributed among age groups, we defined the reproduction distribution RD k (n) that specifies the fraction of new infections generated in age group k (Eq. 14 in the S1 File). We found that more than 80% of new infections are generated within group 1 and 2 ( Fig 4D).
Because the reproduction numbers R k (n) provide only information about the number of infections that will be generated by an infected person from group k, they do not reveal whether the new infections in group k will decline or increase. To characterise these changes, we introduced reproduction growth indices RGI k (n) (Eq. 15 in the S1 File) that measure the change in new infections per age group (Fig 4E). A positive values indicates that the pandemic grows. Interestingly, although the reproduction number for age group 0-19 is larger than one during 2021, the growth index for this group is negative. This shows that infected persons in this age group spread the disease to other groups.  (C) Age group specific effective reproduction numbers R k (n) (Eq. 12 in the S1 File). The value R k (n) gives the average number of new infections that will be generated by a person that becomes newly infected at day n in group k. The yellow curve represents the mean reproduction number R(n) (Eq. 13 in the S1 File). (D) Reproduction distributions showing in which age group new infections will be generated (Eq. 14 in the S1 File). (E) Reproduction growth indices (Eq. 15 in the S1 File). A positive value indicates that the number of new infections will increase in the specific age group.

PLOS ONE
Data-driven multiscale dynamical framework to control a pandemic with non-pharmaceutical interventions Finally, we computed the Infection Hospitality Ratio (IHR) (by dividing the cumulative number of new hospitalisations with the cumulative number of new infections from (A) Fig  5G, continuous lines). We further computed analytic IHR values (Fig 5G, dashed lines) by multiplying the probability for infected persons to become symptomatic with the probability for symptomatic to become hospitalised (the probabilities are specified in the Materials and Methods), and found the age group dependent values 0.12%, 0.92%, 3.3%, 5.6%, and 18.2%. Thus, an infected person in age group 70-79 is around 150 times more likely to become hospitalised compared to a person in age group 0-19. Interestingly, the transient plateau values in Fig 5G before March 18,2020 differ from the analytic predictions during the exponential growth phase because of the time delay between new infection and new hospitalisation.

Hospitalisation dynamics
Predicting the hospital workload from the number of new hospitalisations requires statistics at single hospital level. Because such data is not provided by the French governmental websites, we used age stratified clinical data from the large Bichat hospital in Paris to extract hospital transition probabilities (Fig 3). With the clinical data, we derived age stratified distributions specifying the day l at which patients are transferred to ICU, leave ICU, die in ICU, and are released from hospital (Fig 6A-6D, dashed lines). We further extracted the probability that a patient dies in ICU (Fig 6E). Due to the small sample size per age group, the distributions show large fluctuations (Fig 6A-6D, dashed lines). We fitted the data with the discrete Gamma distribution (Eq. 26 in the S1 File) to smooth the curves, which reveal more clearly difference between age groups (Fig 6A-6D, solid lines). However, we found that using the smooth or fluctuating distributions had only a minor impact on predicting the hospital workload (see next paragraph). Finally, we computed the average number of days patients spend in compartments j = 3, 4, 5 before making transition to a new compartment (Fig 6F).
In the subsequent step, we applied the transition probabilities from Fig 6 to simulate the hospital workload in France based on the evolution of new hospitalisations (Fig 5F). This implicitly assumes that procedures in the Bichat hospital are not too different from the national average. Unfortunately we could not extract the probability to be transferred to ICU, the distribution when patients without ICU treatment are released from hospital, and there was no data for age group 0-19. To compensate for this missing data, we proceeded as follows: we used for age group 0-19 the distributions for age group 20-49 shown in Fig 6A-6D, since we expect similarities between these two age groups, and since the exact shape of the distribution has only a minor impact compared to the overall switching probability (note that distributions are normalised to one); the remaining unknown probabilities were estimated by fitting for each age group the hospitalisation data for France until February 2021 (Fig 1). We concurrently fitted the evolution of patients hospitalised, in ICU, that died in ICU, and released from hospital. For example, for the age group dependent probabilities to be transferred to ICU, we obtained the values 15.2%, 19.5%, 26.0%, 35.7%, and 31.4%.
Finally, we simulated the hospital workload from January 2020 until December 2021, which we compared to data from Fig 1 (Fig 7A-7D, simulations (solid lines) versus data (diamonds)).

PLOS ONE
Data-driven multiscale dynamical framework to control a pandemic with non-pharmaceutical interventions The simulated new hospitalisation from Fig 5F were used as input. With hospital parameters from the Bichat data we could satisfactorily reproduce most of the hospitalisation dynamics in France until February 2021. The simulated number of deceased persons in the age group 70-79 is significantly reduced compared to the France data (Fig 7C), which might indicate that the Bichat hospital has lower probability to ie in ICU compared to the national average. We could have reduced this discrepancy by fitting the probability to die in ICU for this age group. We obtain larger discrepancies between simulation and data for the number of hospitalised patients (Fig 7A) due to differences in the daily number of recovered persons that accumulate over time (Fig 7D). The simulation further suggest that the social contact restrictions from January 2021 would have been sufficient to end the pandemic had no new viral strains emerged. Lastly, we computed the age dependent Infection Fatality Ratio (IFR) by dividing the cumulated numbers of deceased persons and new infections (Fig 7E, solid lines). The analytical values 0.0005%, 0.019%, 0.24%, 0.81%, and 2.62% are obtained by multiplying the probabilities to become symptomatic with the ones for symptomatic to become hospitalised, for patients to be transferred to ICU and to die in ICU (Fig 7E, horizontal dashed lines). For the mean IFR ratio we find a value of 0.2%. Remarkably, the probability to die after infection in the age group 70-79 is more than 5000 times higher compared to the age group 0-19.

Discussion
During the early phase of a pandemic when vaccines are not available, social restrictions are key to prevent an exponential growth and a possible collapse of the health care system. In such conditions, realistic models based on a large variety of data are needed to timely test and optimize NPIs in a constantly changing pandemic environment. In this work, we developed a data-driven multiscale modelling framework together with methods to continuously adapt parameters to new data in order to make reliable near future predictions (Fig 2). Key features of the framework: 1) age-stratified compartmental model with two time scales, calendrical and compartmental time (Fig 3); 2) non-exponential transition probabilities that depend on the compartmental time (Fig 3); 3) use of clinical data to extract hospital transition probabilities (Fig 6); 4) development of fitting methods to infer model parameters and to adjust contact matrices to NPIs (Methods and S1 File); 5) regular checkpoint procedures that accommodate the model to new data and adjust near future predictions (Fig 4).
To show the relevance of our work, we calibrated the framework to study the COVID-19 pandemic at national level in France with multiple social interventions until February 2021. We did not consider data beyond this time because the model does not include vaccinations and multiple viral strains, which changed the pandemic evolution afterwards. We implemented 5 age groups, 5 infection compartments, one compartment for deceased and 4 for recovered persons (Fig 3B and 3C). Besides the calendrical time n, we consider the number of days l since an infected person joined its current compartment (Fig 3A). Since transition probabilities are functions of the compartmental time l, we do not implement compartments like exposed, paucisymptomatic or prodromic, which are necessary in SIR type of models based on exponential switching rates. For example, instead of having an exposed compartment, we simply set the infectivity parameters to zero for the time l < 5 after infection (Eq. 4 in the S1 File). We classified non-hospitalised infected people into asymptomatic and symptomatic, similar to the dual classification from [50]. The difference between asymptomatic and symptomatic persons is that the latter are aware of their infection and take precautions to avoid infecting others. Infected persons with unnoticed symptoms therefore belong to the asymptomatic compartment. Consequently, we consider that only persons belonging to the asymptomatic compartment spread the disease, in agreement with the prevalent role of undocumented and silent transmissions [23, 28,51,52].
We developed constrained fitting procedures to infer model any unknown model parameter from data. We calibrated the infection parameters that are needed to simulate new hospitalisations by fitting data for new hospitalisations (Fig 4D) and data for positive testings (Fig. S1 in the S1 File). We sequentially fitted changes in contact matrices as a function of the underlying NPIs until February 2021 by minimizing the mean squared error between new hospitalisations data and simulation (Fig 4D). For the age groups 20-69 and 70-79 we fitted that 46% and 20% of the infections remain asymptomatic, respectively, values that are within the large published range of 15% to 60% [3,26,31,43,[53][54][55]. In contrast, for the age group 0-19 we fitted a surprisingly high value around 80% due to the following reasons: 1) the age group 0-19 is involved in around 42% of all contacts, comparable to 57% for age group 20-49 (Eq. 6 in the S1 File); 2) the probability for a symptomatic person in the age group 0-19 to become hospitalised is around 0.6%, which is reduced only by a factor 3 compared to the age group 20-49 (Fig. S1 in the S1 File); 3) new hospitalisations in the age group 0-19 are reduced by a large factor 16 compared to 20-49 (Fig 1). As a consequence, to obtain such a drastic reduction in the number of new hospitalisations, the probability to remain asymptomatic has to be much higher in the age group 0-19 compared to 20-49. A much higher asymptomatic fraction for children and adolescents is in agreement with more asymptomatic and mild infections, and fewer symptoms [56][57][58][59][60]. We further fitted a 37% reduced susceptibility to infections in age group 0-19, compatible with a reduced susceptibility for children and adolescents [57,61,62]. We tested scenarios where the susceptibility to infections is independent of age, and found that the fraction of asymptomatic in the age group 0-19 needs to be even higher around 90% (not shown). In summary, these results indicate profound differences in the disease progression between children and adults.
The simulations revealed that by May 11, 2020 around 5% of the French population had been infected, a value that increased to around 13.6% by January 16, 2021 (Fig 5B, yellow  curve). These values are consistent with 5.7% and 14.9% obtained with a completely different method based on serological data and a deconvolution procedure [49]. We estimated that the fraction of infected population is strongly age dependent: by January 16, 2021 only 5% of the population has been infected in the age group 70-79, contrary to 17% in the age group 0-19, similar to the values from [49]. We find that effective reproduction numbers are strongly age dependent (Fig 5C), which is often ignored [8]. Since reproduction numbers depend on social interactions, there is no absolute value for the level of herd immunity [12,32]. For example, with the contact matrix from January 2021 the herd immunity level would be around 21% ( Fig  5B, yellow curve). The effective reproduction numbers R k count the number of infections that a person in age group k will generate (Eq. 12 in the S1 File), but they do not specify how these infections are distributed among age groups, and whether the infection level in a specific group will decline or increase. To quantify these properties, we introduced a reproduction distribution RD k (Eq. 14 in the S1 File) and a reproduction growth index RGI k (Eq. 15 in the S1 File). By computing RD k we found that more than 80% of new infections are generated within age groups 0-19 and 20-49 ( Fig 4D). The RGI k values revealed that the infection level can decline in an age group (Fig 5E, black curve) although the reproduction number is above one (Fig 5C, black curve). We computed infection hospitality ratios (IHR) and found the age group dependent values 0.12%, 0.92%, 3.3%, 5.6%, and 18.2% (Fig 5G), comparable to estimations from [19,49]. These values reveal that a person in age group 70-79 is around 150 times more likely to become hospitalised compared to person in age group 0-19.
We simulated the hospital workload with the simulations for new hospitalisations as input (Fig 5E). Since the French governmental websites do not provide data at single hospital level to quantify hospital procedures, we used clinical data from the Bichat hospital in Paris to estimate hospital transition probabilities (Fig 6). Assuming that the Bichat hospital can serve as a template for France, and we applied these probabilities to model the hospital data at national level.
Hereby we ignored that there are regional differences for the health care system in France [63][64][65]. Because the Bichat data was incomplete to extract all hospital parameters, we quantified the remaining ones by fitting the national data until February 2021 (Fig 7A-7D). With this mixed procedure we could satisfactorily reproduce the hospitalisation dynamics (Fig 7). We further tested a different method where we did not use the Bichat data, but we fitted all hospital parameters. As expected, this yielded a better agreement with the national data and slightly different parameter values. For example, the fitting procedure yielded a slightly higher probability to die in ICU for age group 70-79, which would have reduced the discrepancy between data and simulation for this age group in Fig 7C. Nevertheless, since we think that the use of clinical data is the appropriate way to estimate hospital parameters, we show in Fig 7 the results obtained with the Bichat parameters. In future work, more adapted parameters for the national level will have to be estimated using clinical data from various hospitals and regions in France. Assuming that infection parameters would not have changed after February 2021, we extended the simulation until the end 2021. This revealed that the social contact restrictions from January 2021 would have been sufficient to end the pandemic if no new viral strains had emerged. By combining the infection and hospitalisation parameters, we computed age group specific IFR values of 0.0005%, 0.019%, 0.24%, 0.81%, and 2.67%, with a mean 0.2%, consistent with results from [19,22,51,54,66,67]. The COVID-19 mortality is strongly age dependent, for example, an infected person in the age group 70-79 has a more than 5000 times higher decease probability compared to a person in the age group 0-19.
There are several possibilities to extend and improve this framework in future work. To comply with the actual pandemic situation, it will be necessary to extend the model to handle vaccinations and multiple viral strains. It is conceptually straightforward to include vaccinated persons by further stratifying the population besides the age criteria. To handle multiple viral strains that have different infection and hospitalisation properties, separate models for each strain have to be calibrated and then coupled to make predictions. To better account for spatial inhomogeneities, separate models at national, regional or single hospital level could be implemented, or a spatially resolved social dynamics could be considered [25,68]. Mobility and contact network data could be used to further constrain the fitting of contact matrices, and to acquire a more precise understanding of how NPIS correlate with changes in contact matrices [30]. For example, to design NPIs that control a future pandemic, one could generate artificial new hospitalisation fluxes that do not saturate hospitals, then use the fitting methods to estimate the contact matrices that generate such fluxes, and finally identify the NPIs that result in such contact matrices. Since in this work we focused on the calibration procedure, we used a cost function that is based on the mean squeed error between data and simulations for the fittings. However, by choosing a different cost function one can design NPIs that achieve specific aims, for example, minimizing the number of new infections for elderly persons, or reducing economic cost. In this work we focused on methods, and we did not include an uncertainty analysis for inferred parameters similar to [33]. However, our constrained fitting methods that concurrently optimize many parameters goes much beyond a classical sensitivity analysis where individual parameters are changed separately. Finally, the ultimate goal will be to develop a comprehensive framework and software to control and manipulate the current pandemic, but also to provide a ready-to-use template that can be quickly activated in case of a new pandemic when vaccinations are not yet available.

Conclusion
To conclude, we developed a dynamical and data-driven modelling framework with the aim to control a pandemic evolution with non-pharmaceutical interventions within a constantly evolving social and pandemic environment. We use the calendrical time to model the overall pandemic evolution, and the time since infection to compute the individual disease progression. We used clinical data to quantify hospitalisation procedures, and we developed calibration and fitting procedures to continuously adapt model parameters to an evolving environment. The final goal is continuously improve this framework in order to obtain a calibrated modelling framework that can be used with new virus to make reliable predictions of the near future pandemic and hospitalisation evolution. This will allow to optimize policies in order to maintain control of the viral spreading and the workload of the health care system.